27 research outputs found
Unsupervised Human Action Detection by Action Matching
We propose a new task of unsupervised action detection by action matching.
Given two long videos, the objective is to temporally detect all pairs of
matching video segments. A pair of video segments is matched if both segments
share the same human action. The task is category-independent---it does not
matter what
action is being performed---and no supervision is used to discover such video
segments. Unsupervised action detection by action matching allows us to align
videos in a meaningful manner. As such, it can be used to discover new action
categories or as an action proposal technique within, say, an action detection
pipeline. Moreover, it is a useful pre-processing step for generating video
highlights, e.g., from sports videos.
We present an effective and efficient method for unsupervised action
detection. We use an unsupervised temporal encoding method and exploit the
temporal consistency in human actions to obtain candidate action segments. We
evaluate our method on this challenging task using three activity recognition
benchmarks, namely, the MPII Cooking activities dataset, the THUMOS15 action
detection benchmark and a new dataset called the IKEA dataset. On the MPII
Cooking dataset we detect action segments with a precision of 21.6% and recall
of 11.7% over 946 long video pairs and over 5000 ground truth action segments.
Similarly, on the THUMOS dataset we obtain 18.4% precision and 25.1% recall
over 5094 ground truth action segment pairs.
Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
2017 Workshop
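The matching idea above can be sketched as follows, using mean-pooled window features as a stand-in for the paper's unsupervised temporal encoding; the function name, window length, and similarity threshold are illustrative choices, not the authors' implementation.

```python
import numpy as np

def match_segments(feats_a, feats_b, win=8, thresh=0.9):
    """Match temporal windows of two videos by cosine similarity of their
    mean frame features (a stand-in for a learned temporal encoding).
    Returns mutually-best window pairs above the similarity threshold."""
    def windows(f):
        return np.array([f[i:i + win].mean(axis=0)
                         for i in range(len(f) - win + 1)])
    wa, wb = windows(feats_a), windows(feats_b)
    wa /= np.linalg.norm(wa, axis=1, keepdims=True)
    wb /= np.linalg.norm(wb, axis=1, keepdims=True)
    sim = wa @ wb.T                      # pairwise cosine similarity
    pairs = []
    for i in range(sim.shape[0]):
        j = int(sim[i].argmax())
        # keep only mutually-best matches: category independence means we
        # never ask *which* action is shared, only that the windows agree
        if sim[i, j] >= thresh and int(sim[:, j].argmax()) == i:
            pairs.append((i, j, float(sim[i, j])))
    return pairs
```

A real system would add the temporal-consistency step the abstract mentions (grouping adjacent matched windows into segments); this sketch stops at window-level matching.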
Evaluation of Object Detection Proposals Under Condition Variations
Object detection is a fundamental task in many computer vision applications,
therefore the importance of evaluating the quality of object detection is well
acknowledged in this domain. This process gives insight into the capabilities
of methods in handling environmental changes. In this paper, a new method for
generating object detection proposals is introduced that combines Selective
Search and EdgeBoxes. We tested all three methods (Selective Search, EdgeBoxes,
and their combination) under environmental variations. Our experiments
demonstrate that the combined method outperforms the individual ones under
illumination and viewpoint variations.
Comment: 2 pages, 6 figures, CVPR Workshop, 201
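A minimal sketch of pooling two scored proposal sets, assuming a generic IoU-based duplicate suppression rather than the paper's exact combination rule; all names and thresholds are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def merge_proposals(set_a, set_b, iou_thresh=0.7):
    """Pool two scored proposal lists (e.g. Selective Search + EdgeBoxes)
    and suppress near-duplicate boxes, keeping the higher-scored one."""
    pooled = sorted(set_a + set_b, key=lambda p: p[1], reverse=True)
    kept = []
    for box, score in pooled:
        if all(iou(box, kb) < iou_thresh for kb, _ in kept):
            kept.append((box, score))
    return kept
```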
Bags of Affine Subspaces for Robust Object Tracking
We propose an adaptive tracking algorithm where the object is modelled as a
continuously updated bag of affine subspaces, with each subspace constructed
from the object's appearance over several consecutive frames. In contrast to
linear subspaces, affine subspaces explicitly model the origin of subspaces.
Furthermore, instead of using a brittle point-to-subspace distance during the
search for the object in a new frame, we propose to use a subspace-to-subspace
distance by representing candidate image areas also as affine subspaces.
Distances between subspaces are then obtained by exploiting the non-Euclidean
geometry of Grassmann manifolds. Experiments on challenging videos (containing
object occlusions, deformations, as well as variations in pose and
illumination) indicate that the proposed method achieves higher tracking
accuracy than several recent discriminative trackers.
Comment: in International Conference on Digital Image Computing: Techniques
and Applications, 201
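The subspace machinery described above can be sketched as follows: an affine subspace is a mean (origin) plus the top principal directions of the object's recent appearances, and a subspace-to-subspace distance combines the Grassmann geodesic distance (from principal angles) with an origin term. The weight `alpha` and the exact combination are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def affine_subspace(frames, k=3):
    """Model a stack of vectorised frames (n x d) as an affine subspace:
    the mean (origin) plus the top-k principal directions."""
    mu = frames.mean(axis=0)
    # rows of vt are right singular vectors of the centred data
    _, _, vt = np.linalg.svd(frames - mu, full_matrices=False)
    return mu, vt[:k].T              # origin, d x k orthonormal basis

def subspace_distance(a, b, alpha=1.0):
    """Geodesic distance on the Grassmann manifold (via principal angles)
    plus a weighted origin term; alpha is an assumed trade-off weight."""
    (mu_a, U), (mu_b, V) = a, b
    s = np.clip(np.linalg.svd(U.T @ V, compute_uv=False), -1.0, 1.0)
    theta = np.arccos(s)             # principal angles between the spans
    return np.sqrt((theta ** 2).sum()) + alpha * np.linalg.norm(mu_a - mu_b)
```

Comparing candidate regions then means building an affine subspace from each candidate's recent appearance and taking the candidate with the smallest distance to the bag's subspaces.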
Action Recognition: From Static Datasets to Moving Robots
Deep learning models have achieved state-of-the-art performance in
recognizing human activities, but often rely on background cues
present in typical computer vision datasets that predominantly have a
stationary camera. If these models are to be employed by autonomous robots in
real world environments, they must be adapted to perform independently of
background cues and camera motion effects. To address these challenges, we
propose a new method that firstly generates generic action region proposals
with good potential to locate one human action in unconstrained videos
regardless of camera motion and then uses action proposals to extract and
classify effective shape and motion features by a ConvNet framework. In a range
of experiments, we demonstrate that by actively proposing action regions during
both training and testing, state-of-the-art or better performance is achieved
on benchmarks. We show that our approach outperforms the state-of-the-art on
two new datasets: one emphasizes irrelevant background, the other camera
motion. We also validate our action recognition method in an abnormal behavior
detection scenario to improve workplace safety. The results verify a higher
success rate for our method, owing to the ability of our system to recognize
human actions regardless of environment and camera motion.
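The proposal-then-classify pipeline above assumes fixed-size region inputs for the ConvNet; a minimal sketch of that cropping step, where nearest-neighbour resizing stands in for whatever interpolation a real pipeline would use:

```python
import numpy as np

def crop_and_resize(frame, box, out=32):
    """Crop an action region proposal (x1, y1, x2, y2) from a frame and
    nearest-neighbour resize it to a fixed network input size."""
    x1, y1, x2, y2 = box
    patch = frame[y1:y2, x1:x2]
    ys = np.arange(out) * patch.shape[0] // out   # nearest source rows
    xs = np.arange(out) * patch.shape[1] // out   # nearest source cols
    return patch[np.ix_(ys, xs)]
```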
Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching
A convenient way of dealing with image sets is to represent them as points on Grassmannian manifolds. While several recent studies explored the applicability of discriminant analysis on such manifolds, the conventional formalism of discriminant analysis suffers from not considering the local structure of the data. We propose a discriminant analysis approach on Grassmannian manifolds, based on a graph-embedding framework. We show that by introducing within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, the geometrical structure of data can be exploited. Experiments on several image datasets (PIE, BANCA, MoBo, ETH-80) show that the proposed algorithm obtains considerable improvements in discrimination accuracy, in comparison to three recent methods: Grassmann Discriminant Analysis (GDA), Kernel GDA, and the kernel version of Affine Hull Image Set Distance. We further propose a Grassmannian kernel, based on canonical correlation between subspaces, which can increase discrimination accuracy when used in combination with previous Grassmannian kernels.
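The within-class and between-class similarity graphs can be sketched as follows; the k-nearest-neighbour construction and binary edge weights are illustrative assumptions, since the abstract does not give the exact neighbourhood rule.

```python
import numpy as np

def similarity_graphs(sim, labels, k=2):
    """Build within-class (Ww) and between-class (Wb) k-nearest-neighbour
    graphs from a pairwise similarity matrix; Ww characterises intra-class
    compactness, Wb inter-class separability."""
    n = len(labels)
    Ww, Wb = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        diff = [j for j in range(n) if labels[j] != labels[i]]
        # connect each point to its k most similar same-class neighbours
        for j in sorted(same, key=lambda t: -sim[i, t])[:k]:
            Ww[i, j] = Ww[j, i] = 1.0
        # and to its k most similar other-class neighbours
        for j in sorted(diff, key=lambda t: -sim[i, t])[:k]:
            Wb[i, j] = Wb[j, i] = 1.0
    return Ww, Wb
```

In a graph-embedding formulation, a projection is then sought that keeps Ww-connected points close while pushing Wb-connected points apart.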
Kernel analysis on Grassmann manifolds for action recognition
Modelling video sequences by subspaces has recently shown promise for recognising human actions. Subspaces are able to accommodate the effects of various image variations and can capture the dynamic properties of actions. Subspaces form a non-Euclidean and curved Riemannian manifold known as a Grassmann manifold. Inference on manifold spaces is usually achieved by embedding the manifolds in higher dimensional Euclidean spaces. In this paper, we instead propose to embed the Grassmann manifolds into reproducing kernel Hilbert spaces and then tackle the problem of discriminant analysis on such manifolds. To achieve efficient machinery, we propose graph-based local discriminant analysis that utilises within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, respectively. Experiments on KTH, UCF Sports, and Ballet datasets show that the proposed approach obtains marked improvements in discrimination accuracy in comparison to several state-of-the-art methods, such as the kernel version of affine hull image-set distance, tensor canonical correlation analysis, spatial-temporal words and hierarchy of discriminative space-time neighbourhood features.
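One standard Grassmannian kernel consistent with the description above is the projection kernel, whose value is the sum of squared canonical correlations between two subspaces; it is a well-known positive-definite kernel on the Grassmann manifold, shown here as an illustration rather than the paper's exact kernel.

```python
import numpy as np

def projection_kernel(U, V):
    """Projection kernel on the Grassmann manifold: the squared Frobenius
    norm of U^T V, i.e. the sum of squared canonical correlations between
    the two subspaces (columns of U and V are orthonormal bases)."""
    return float(np.linalg.norm(U.T @ V, 'fro') ** 2)
```

With such a kernel, the Gram matrix over a set of action subspaces embeds the manifold into a reproducing kernel Hilbert space, where kernelised discriminant analysis applies directly.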
Assessment of an Unshielded Electron Field Diode Dosimeter for Beam Scanning in Small- to Medium-Sized 6 MV Photon Fields
Introduction: Radiotherapy planning systems require many percentage depth dose (PDD) and profile measurements, and there are various dosimeters that can be used to obtain these scans. As dose perturbation is particularly troublesome in smaller photon fields, using a low-perturbation, unshielded electron field diode (EFD) in these fields is of interest. The aim of this work was to investigate the suitability of an unshielded diode for beam scanning in 3×3 cm², 5×5 cm², and 10×10 cm², 6 MV fields. Materials and Methods: An EFD was used for all the scans. For comparison, in profile measurements, a tungsten-shielded photon field diode (PFD) was also used. PDDs were measured using the PFD and an RK ionization chamber. Results: Very good agreement (0.4%) was found between the PDDs measured with the EFD and PFD for the two larger fields. However, the difference between them slightly exceeded 1.0% for the smallest field, which may be attributed to the effect of the larger PFD perturbation. The RK chamber PDDs around 10 cm depth were 1-2% lower than those measured with the diodes. There was good agreement (<1 mm) between EFD- and PFD-measured penumbra widths. Conclusion: The EFD generally agrees well with the PFD and may even perform better in smaller fields.
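The two comparisons reported above, point-by-point PDD differences and penumbra widths, can be sketched numerically; the 80%-20% linear-interpolation penumbra estimate is an illustrative simplification of what clinical scanning software computes.

```python
import numpy as np

def pdd_difference(pdd_a, pdd_b):
    """Point-by-point difference between two PDD curves sampled at the
    same depths (both expressed in % of dose at the reference depth)."""
    return np.asarray(pdd_a) - np.asarray(pdd_b)

def penumbra_width(x, profile, lo=0.2, hi=0.8):
    """80%-20% penumbra width of a rising profile edge, by linear
    interpolation against the profile maximum (a sketch; clinical tools
    also smooth and handle both edges)."""
    p = np.asarray(profile, dtype=float) / max(profile)
    x_lo = np.interp(lo, p, x)   # position of the 20% dose level
    x_hi = np.interp(hi, p, x)   # position of the 80% dose level
    return abs(x_hi - x_lo)
```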
What would you do? Acting by learning to predict
We propose to learn tasks directly from visual demonstrations by learning to predict the outcome of human and robot actions on an environment. We enable a robot to physically perform a human demonstrated task without knowledge of the thought processes or actions of the human, only their visually observable state transitions. We evaluate our approach on two table-top object manipulation tasks and demonstrate generalisation to previously unseen states. Our approach reduces the priors required to implement a robot task learning system compared with the existing approaches of Learning from Demonstration, Reinforcement Learning and Inverse Reinforcement Learning.
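In its simplest one-step form, acting by outcome prediction reduces to choosing the action whose predicted next state is closest to the goal; `predict` and `dist` here are assumed task-specific stand-ins, not the paper's learned models.

```python
def act_by_prediction(state, goal, actions, predict, dist):
    """Greedy one-step planner: pick the action whose predicted outcome
    is nearest to the goal state under the given distance."""
    return min(actions, key=lambda a: dist(predict(state, a), goal))
```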